Lab Report (TP2) - Process Mining

Prepared by:

Youssef Skouri
Baha Khayati
Taysir Fattoumi

Problem: process analysis

When we talk about “process analysis”, a natural question is whether there is an alternative to relying solely on workshops, interviews or outdated process documents. In a world that depends so heavily on computing technology, it makes sense to look for a more effective and reliable way to do the job.

Solution: “Process Mining”

What is process mining? (http://www.processmining.org/research/start)

Process mining techniques allow for extracting information from event logs. For example, the audit trails of a workflow management system or the transaction logs of an enterprise resource planning system can be used to discover models describing processes, organizations, and products. Moreover, it is possible to use process mining to monitor deviations (e.g., comparing the observed events with predefined models or business rules in the context of SOX).

Libraries

library(bupaR)
## Warning: package 'bupaR' was built under R version 3.4.4
## Loading required package: edeaR
## Warning: package 'edeaR' was built under R version 3.4.4
## Loading required package: eventdataR
## Warning: package 'eventdataR' was built under R version 3.4.4
## Loading required package: processmapR
## Warning: package 'processmapR' was built under R version 3.4.4
## Loading required package: xesreadR
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'xesreadR'
## Loading required package: processmonitR
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'processmonitR'
## Loading required package: petrinetR
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'petrinetR'
## 
## Attaching package: 'bupaR'
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:utils':
## 
##     timestamp
library(edeaR)
library(processmapR)
library(eventdataR)
library(readr)
## Warning: package 'readr' was built under R version 3.4.4
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.4.4
## -- Attaching packages ---------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.6
## v tidyr   0.8.1     v stringr 1.4.0
## v ggplot2 3.1.0     v forcats 0.3.0
## Warning: package 'ggplot2' was built under R version 3.4.4
## Warning: package 'tibble' was built under R version 3.4.4
## Warning: package 'tidyr' was built under R version 3.4.4
## Warning: package 'purrr' was built under R version 3.4.4
## Warning: package 'dplyr' was built under R version 3.4.4
## Warning: package 'stringr' was built under R version 3.4.4
## Warning: package 'forcats' was built under R version 3.4.4
## -- Conflicts ------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks bupaR::filter(), stats::filter()
## x dplyr::lag()    masks stats::lag()
library(DiagrammeR)
## Warning: package 'DiagrammeR' was built under R version 3.4.4
library(ggplot2)
library(stringr)
library(lubridate)
## Warning: package 'lubridate' was built under R version 3.4.4
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date

Data import and preparation

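# Base R import (character columns become factors in R < 4.0)
data <- read.csv("credit_file +.csv", sep = ",", header = TRUE)
# readr import (tibble with typed columns); the file is read as ISO-8859-1
data_act <- readr::read_csv("credit_file +.csv",
                            locale = locale(date_names = 'en',
                                            encoding = 'ISO-8859-1'))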
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   Variant_index = col_double(),
##   `(case)_RequestedAmount` = col_double(),
##   Accepted = col_logical(),
##   CreditScore = col_double(),
##   FirstWithdrawalAmount = col_double(),
##   MonthlyCost = col_double(),
##   NumberOfTerms = col_double(),
##   OfferedAmount = col_double(),
##   Selected = col_logical(),
##   starttimestamp = col_datetime(format = ""),
##   endtimestamp = col_datetime(format = "")
## )
## See spec(...) for full column specifications.
## Warning: 36773 parsing failures.
## row col   expected    actual                file
## 174  -- 24 columns 1 columns 'credit_file +.csv'
## 175  -- 24 columns 1 columns 'credit_file +.csv'
## 176  -- 24 columns 1 columns 'credit_file +.csv'
## 177  -- 24 columns 1 columns 'credit_file +.csv'
## 178  -- 24 columns 1 columns 'credit_file +.csv'
## ... ... .......... ......... ...................
## See problems(...) for more details.
head(data)
##                 Case_ID               Activity Resource
## 1 Application_652823628   A_Create Application   User_1
## 2 Application_652823628            A_Submitted   User_1
## 3 Application_652823628              A_Concept   User_1
## 4 Application_652823628 W_Complete application  User_17
## 5 Application_652823628             A_Accepted  User_52
## 6 Application_652823628         O_Create Offer  User_52
##           Start_Timestamp      Complete_Timestamp   Variant Variant_index
## 1 2016/01/01 10:51:15.304 2016/01/01 10:51:15.304 Variant 2             2
## 2 2016/01/01 10:51:15.352 2016/01/01 10:51:15.352 Variant 2             2
## 3 2016/01/01 10:52:36.413 2016/01/01 10:52:36.413 Variant 2             2
## 4 2016/01/02 11:45:22.429 2016/01/02 11:45:22.429 Variant 2             2
## 5 2016/01/02 12:23:04.299 2016/01/02 12:23:04.299 Variant 2             2
## 6 2016/01/02 12:29:03.994 2016/01/02 12:29:03.994 Variant 2             2
##   X.case._ApplicationType       X.case._creditGoal X.case._RequestedAmount
## 1              New credit Existing credit takeover                   20000
## 2              New credit Existing credit takeover                   20000
## 3              New credit Existing credit takeover                   20000
## 4              New credit Existing credit takeover                   20000
## 5              New credit Existing credit takeover                   20000
## 6              New credit Existing credit takeover                   20000
##   Accepted      Action CreditScore               EventID EventOrigin
## 1     <NA>     Created          NA Application_652823628 Application
## 2     <NA> statechange          NA  ApplState_1582051990 Application
## 3     <NA> statechange          NA   ApplState_642383566 Application
## 4     <NA>    Obtained          NA   Workitem_1875340971    Workflow
## 5     <NA> statechange          NA    ApplState_99568828 Application
## 6     true     Created         979       Offer_148581083       Offer
##   FirstWithdrawalAmount MonthlyCost NumberOfTerms OfferID OfferedAmount
## 1                    NA          NA            NA    <NA>            NA
## 2                    NA          NA            NA    <NA>            NA
## 3                    NA          NA            NA    <NA>            NA
## 4                    NA          NA            NA    <NA>            NA
## 5                    NA          NA            NA    <NA>            NA
## 6                 20000      498.29            44    <NA>         20000
##   Selected lifecycle.transition       starttimestamp         endtimestamp
## 1     <NA>             complete 2016-01-01T09:51:15Z 2016-01-01T09:51:15Z
## 2     <NA>             complete 2016-01-01T09:51:15Z 2016-01-01T09:51:15Z
## 3     <NA>             complete 2016-01-01T09:52:36Z 2016-01-01T09:52:36Z
## 4     <NA>                start 2016-01-02T10:45:22Z 2016-01-02T10:45:22Z
## 5     <NA>             complete 2016-01-02T11:23:04Z 2016-01-02T11:23:04Z
## 6     true             complete 2016-01-02T11:29:03Z 2016-01-02T11:29:03Z
head(data_act)
## # A tibble: 6 x 24
##   Case_ID Activity Resource Start_Timestamp Complete_Timest~ Variant
##   <chr>   <chr>    <chr>    <chr>           <chr>            <chr>  
## 1 Applic~ A_Creat~ User_1   2016/01/01 10:~ 2016/01/01 10:5~ Varian~
## 2 Applic~ A_Submi~ User_1   2016/01/01 10:~ 2016/01/01 10:5~ Varian~
## 3 Applic~ A_Conce~ User_1   2016/01/01 10:~ 2016/01/01 10:5~ Varian~
## 4 Applic~ W_Compl~ User_17  2016/01/02 11:~ 2016/01/02 11:4~ Varian~
## 5 Applic~ A_Accep~ User_52  2016/01/02 12:~ 2016/01/02 12:2~ Varian~
## 6 Applic~ O_Creat~ User_52  2016/01/02 12:~ 2016/01/02 12:2~ Varian~
## # ... with 18 more variables: Variant_index <dbl>,
## #   `(case)_ApplicationType` <chr>, `(case)_creditGoal` <chr>,
## #   `(case)_RequestedAmount` <dbl>, Accepted <lgl>, Action <chr>,
## #   CreditScore <dbl>, EventID <chr>, EventOrigin <chr>,
## #   FirstWithdrawalAmount <dbl>, MonthlyCost <dbl>, NumberOfTerms <dbl>,
## #   OfferID <chr>, OfferedAmount <dbl>, Selected <lgl>,
## #   `lifecycle:transition` <chr>, starttimestamp <dttm>,
## #   endtimestamp <dttm>
# Convert the timestamp columns to POSIXct
# (note: "%S" reads whole seconds, so the milliseconds are dropped)
data_act$starttimestamp <- as.POSIXct(data_act$Start_Timestamp,
                                      format = "%Y/%m/%d %H:%M:%S")
data_act$endtimestamp <- as.POSIXct(data_act$Complete_Timestamp,
                                    format = "%Y/%m/%d %H:%M:%S")
data$Complete_Timestamp <- as.POSIXct(data$Complete_Timestamp,
                                      format = "%Y/%m/%d %H:%M:%S")
# One unique activity instance ID per row (needed by eventlog())
data$Activity_Instance_ID <- seq(1, nrow(data))
str(data)
## 'data.frame':    400000 obs. of  25 variables:
##  $ Case_ID                : Factor w/ 57165 levels "Application_1000086665,A_Accepted,User_5,2016/08/05 15:57:07.419,2016/08/05 15:57:07.419,Variant 1,1,New credit"| __truncated__,..: 47006 47006 47006 47006 47006 47006 47006 47006 47006 47006 ...
##  $ Activity               : Factor w/ 26 levels "","A_Accepted",..: 6 10 5 23 2 14 15 18 21 4 ...
##  $ Resource               : Factor w/ 132 levels "","User_1","User_10",..: 2 2 2 52 91 91 91 91 91 91 ...
##  $ Start_Timestamp        : Factor w/ 363222 levels "","2016/01/01 10:51:15.304",..: 2 3 4 125 147 158 159 160 161 162 ...
##  $ Complete_Timestamp     : POSIXct, format: "2016-01-01 10:51:15" "2016-01-01 10:51:15" ...
##  $ Variant                : Factor w/ 2917 levels "","Variant 1",..: 1031 1031 1031 1031 1031 1031 1031 1031 1031 1031 ...
##  $ Variant_index          : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ X.case._ApplicationType: Factor w/ 3 levels "","Limit raise",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ X.case._creditGoal     : Factor w/ 14 levels "","Boat","Business goal",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ X.case._RequestedAmount: num  20000 20000 20000 20000 20000 20000 20000 20000 20000 20000 ...
##  $ Accepted               : Factor w/ 3 levels "","false","true": NA NA NA NA NA 3 NA NA NA NA ...
##  $ Action                 : Factor w/ 5 levels "","Created","Deleted",..: 2 5 5 4 5 2 5 5 4 5 ...
##  $ CreditScore            : int  NA NA NA NA NA 979 NA NA NA NA ...
##  $ EventID                : Factor w/ 363228 levels "","Application_1000158214",..: 16706 60870 129762 318353 154191 161638 208779 235898 351445 150834 ...
##  $ EventOrigin            : Factor w/ 4 levels "","Application",..: 2 2 2 4 2 3 3 3 4 2 ...
##  $ FirstWithdrawalAmount  : num  NA NA NA NA NA 20000 NA NA NA NA ...
##  $ MonthlyCost            : num  NA NA NA NA NA ...
##  $ NumberOfTerms          : int  NA NA NA NA NA 44 NA NA NA NA ...
##  $ OfferID                : Factor w/ 28025 levels "","Offer_1000096910",..: NA NA NA NA NA NA 7178 7178 NA NA ...
##  $ OfferedAmount          : num  NA NA NA NA NA 20000 NA NA NA NA ...
##  $ Selected               : Factor w/ 3 levels "","false","true": NA NA NA NA NA 3 NA NA NA NA ...
##  $ lifecycle.transition   : Factor w/ 3 levels "","complete",..: 2 2 2 3 2 2 2 2 3 2 ...
##  $ starttimestamp         : Factor w/ 234909 levels "","2016-01-01T09:51:15Z",..: 2 2 3 84 99 108 109 110 110 110 ...
##  $ endtimestamp           : Factor w/ 220855 levels "","2016-01-01T09:51:15Z",..: 2 2 3 80 92 101 102 103 103 103 ...
##  $ Activity_Instance_ID   : int  1 2 3 4 5 6 7 8 9 10 ...
head(data)
##                 Case_ID               Activity Resource
## 1 Application_652823628   A_Create Application   User_1
## 2 Application_652823628            A_Submitted   User_1
## 3 Application_652823628              A_Concept   User_1
## 4 Application_652823628 W_Complete application  User_17
## 5 Application_652823628             A_Accepted  User_52
## 6 Application_652823628         O_Create Offer  User_52
##           Start_Timestamp  Complete_Timestamp   Variant Variant_index
## 1 2016/01/01 10:51:15.304 2016-01-01 10:51:15 Variant 2             2
## 2 2016/01/01 10:51:15.352 2016-01-01 10:51:15 Variant 2             2
## 3 2016/01/01 10:52:36.413 2016-01-01 10:52:36 Variant 2             2
## 4 2016/01/02 11:45:22.429 2016-01-02 11:45:22 Variant 2             2
## 5 2016/01/02 12:23:04.299 2016-01-02 12:23:04 Variant 2             2
## 6 2016/01/02 12:29:03.994 2016-01-02 12:29:03 Variant 2             2
##   X.case._ApplicationType       X.case._creditGoal X.case._RequestedAmount
## 1              New credit Existing credit takeover                   20000
## 2              New credit Existing credit takeover                   20000
## 3              New credit Existing credit takeover                   20000
## 4              New credit Existing credit takeover                   20000
## 5              New credit Existing credit takeover                   20000
## 6              New credit Existing credit takeover                   20000
##   Accepted      Action CreditScore               EventID EventOrigin
## 1     <NA>     Created          NA Application_652823628 Application
## 2     <NA> statechange          NA  ApplState_1582051990 Application
## 3     <NA> statechange          NA   ApplState_642383566 Application
## 4     <NA>    Obtained          NA   Workitem_1875340971    Workflow
## 5     <NA> statechange          NA    ApplState_99568828 Application
## 6     true     Created         979       Offer_148581083       Offer
##   FirstWithdrawalAmount MonthlyCost NumberOfTerms OfferID OfferedAmount
## 1                    NA          NA            NA    <NA>            NA
## 2                    NA          NA            NA    <NA>            NA
## 3                    NA          NA            NA    <NA>            NA
## 4                    NA          NA            NA    <NA>            NA
## 5                    NA          NA            NA    <NA>            NA
## 6                 20000      498.29            44    <NA>         20000
##   Selected lifecycle.transition       starttimestamp         endtimestamp
## 1     <NA>             complete 2016-01-01T09:51:15Z 2016-01-01T09:51:15Z
## 2     <NA>             complete 2016-01-01T09:51:15Z 2016-01-01T09:51:15Z
## 3     <NA>             complete 2016-01-01T09:52:36Z 2016-01-01T09:52:36Z
## 4     <NA>                start 2016-01-02T10:45:22Z 2016-01-02T10:45:22Z
## 5     <NA>             complete 2016-01-02T11:23:04Z 2016-01-02T11:23:04Z
## 6     true             complete 2016-01-02T11:29:03Z 2016-01-02T11:29:03Z
##   Activity_Instance_ID
## 1                    1
## 2                    2
## 3                    3
## 4                    4
## 5                    5
## 6                    6
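The format string "%Y/%m/%d %H:%M:%S" reads whole seconds only, so the milliseconds in Start_Timestamp and Complete_Timestamp are dropped, and events recorded within the same second (such as A_Create Application and A_Submitted in the first case) end up with identical timestamps. The minimal base-R sketch below, using the two raw values shown above, illustrates the difference with "%OS", which also reads fractional seconds. Switching to "%OS" is only a suggestion, not what was used in this analysis, and tz = "UTC" is an assumption added to make the example deterministic.

```r
# Two raw timestamps taken from the first case of the log, 48 ms apart
raw <- c("2016/01/01 10:51:15.304", "2016/01/01 10:51:15.352")

# "%S" parses whole seconds; the trailing ".304" / ".352" is ignored
ts_s  <- as.POSIXct(raw, format = "%Y/%m/%d %H:%M:%S",  tz = "UTC")
# "%OS" parses the seconds including their fractional part
ts_os <- as.POSIXct(raw, format = "%Y/%m/%d %H:%M:%OS", tz = "UTC")

gap_s  <- as.numeric(difftime(ts_s[2],  ts_s[1],  units = "secs"))
gap_os <- as.numeric(difftime(ts_os[2], ts_os[1], units = "secs"))
print(c(with_S = gap_s, with_OS = gap_os))
```

With "%S" the gap is 0, so the order of the two events can only come from the row order; with "%OS" it is about 0.048 seconds.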

Event log creation with eventlog()

eventlog = data %>% #a data.frame with the information in the table above
    eventlog(
        case_id = "Case_ID",
        activity_id = "Activity",
        activity_instance_id = "Activity_Instance_ID",
        lifecycle_id = "lifecycle.transition",
        timestamp = "Complete_Timestamp",
        resource_id = "Resource"
    )
## Warning: package 'bindrcpp' was built under R version 3.4.4
eventlog %>% summary
## Number of events:  400000
## Number of cases:  57165
## Number of traces:  4802
## Number of distinct activities:  26
## Average trace length:  6.997289
## 
## Start eventlog:  NA
## End eventlog:  NA
##    Case_ID                              Activity         Resource     
##  Length:400000                              : 36773   User_1 : 49376  
##  Class :character   O_Create Offer          : 28024          : 36773  
##  Mode  :character   O_Created               : 28024   User_49:  7778  
##                     O_Sent (mail and online): 25938   User_29:  6932  
##                     W_Validate application  : 25480   User_3 :  6805  
##                     A_Validating            : 25013   User_10:  6694  
##                     (Other)                 :230748   (Other):285642  
##                 Start_Timestamp   Complete_Timestamp           
##                         : 36773   Min.   :2016-01-01 10:51:15  
##  2016/01/08 19:56:43.212:     2   1st Qu.:2016-03-19 17:33:35  
##  2016/01/29 09:10:58.778:     2   Median :2016-06-07 10:39:49  
##  2016/03/02 15:15:40.745:     2   Mean   :2016-05-28 17:17:46  
##  2016/07/11 15:17:14.450:     2   3rd Qu.:2016-08-03 03:16:04  
##  2016/07/15 13:09:09.433:     2   Max.   :2017-01-26 10:11:10  
##  (Other)                :363217   NA's   :36773                
##       Variant       Variant_index    X.case._ApplicationType
##           : 36773   Min.   :   1.0              : 36773     
##  Variant 1: 27588   1st Qu.:   7.0   Limit raise: 36394     
##  Variant 2: 19570   Median :  25.0   New credit :326833     
##  Variant 3: 14685   Mean   : 390.3                          
##  Variant 5: 10878   3rd Qu.: 257.0                          
##  Variant 8: 10008   Max.   :3159.0                          
##  (Other)  :280498   NA's   :36773                           
##                 X.case._creditGoal X.case._RequestedAmount  Accepted     
##  Car                     :116956   Min.   :     0               : 36773  
##  Home improvement        : 95296   1st Qu.:  6000          false:  8359  
##  Existing credit takeover: 71011   Median : 12000          true : 19665  
##                          : 36773   Mean   : 15618          NA's :335203  
##  Unknown                 : 32069   3rd Qu.: 20000                        
##  Not speficied           : 14204   Max.   :450000                        
##  (Other)                 : 33691   NA's   :36773                         
##          Action        CreditScore                       EventID      
##             : 36773   Min.   :   0.0                         : 36773  
##  Created    : 48416   1st Qu.:   0.0   Application_1000158214:     1  
##  Deleted    : 27791   Median :   0.0   Application_1000311556:     1  
##  Obtained   : 54569   Mean   : 319.9   Application_1000339879:     1  
##  statechange:232451   3rd Qu.: 851.0   Application_100034150 :     1  
##                       Max.   :1142.0   Application_1000557783:     1  
##                       NA's   :371976   (Other)               :363222  
##       EventOrigin     FirstWithdrawalAmount  MonthlyCost    
##             : 36773   Min.   :    0         Min.   :  43.0  
##  Application:154460   1st Qu.:    0         1st Qu.: 150.0  
##  Offer      :126407   Median : 5000         Median : 232.8  
##  Workflow   : 82360   Mean   : 7681         Mean   : 273.9  
##                       3rd Qu.:10996         3rd Qu.: 340.8  
##                       Max.   :75000         Max.   :6673.8  
##                       NA's   :371976        NA's   :371976  
##  NumberOfTerms                OfferID       OfferedAmount   
##  Min.   :  5                      : 36773   Min.   : 5000   
##  1st Qu.: 56      Offer_1000226917:     4   1st Qu.: 8000   
##  Median : 74      Offer_1000329580:     4   Median :15000   
##  Mean   : 82      Offer_1000373613:     4   Mean   :17820   
##  3rd Qu.:120      Offer_1000572979:     4   3rd Qu.:24000   
##  Max.   :180      (Other)         : 98367   Max.   :75000   
##  NA's   :371976   NA's            :264844   NA's   :371976  
##   Selected      lifecycle.transition              starttimestamp  
##       : 36773           : 36773                          : 36773  
##  false: 13915   complete:308658      2016-03-28T12:15:55Z:    10  
##  true : 14109   start   : 54569      2016-02-16T16:25:24Z:     9  
##  NA's :335203                        2016-03-03T07:01:13Z:     9  
##                                      2016-03-09T08:44:36Z:     9  
##                                      2016-04-28T06:00:26Z:     9  
##                                      (Other)             :363181  
##                endtimestamp    Activity_Instance_ID     .order     
##                      : 36773   Length:400000        Min.   :1e+00  
##  2016-03-28T12:15:55Z:    11   Class :character     1st Qu.:1e+05  
##  2016-03-09T08:44:36Z:    10   Mode  :character     Median :2e+05  
##  2016-07-22T14:19:10Z:    10                        Mean   :2e+05  
##  2016-02-15T14:35:51Z:     9                        3rd Qu.:3e+05  
##  2016-02-16T16:25:24Z:     9                        Max.   :4e+05  
##  (Other)             :363178
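The summary shows 36773 events with an empty Case_ID, Activity and Resource and an NA timestamp. They appear to come from the empty lines of the CSV (readr reported 36773 parsing failures with 1 column instead of 24), and they are probably why Start eventlog and End eventlog are NA. A minimal sketch of removing such rows before building the event log, on a hypothetical toy data frame whose column names mirror data but whose values are invented:

```r
# Hypothetical toy data: the third row mimics one of the empty lines
toy <- data.frame(
  Case_ID  = c("Application_1", "Application_1", "", "Application_2"),
  Activity = c("A_Create Application", "A_Submitted", "",
               "A_Create Application"),
  Complete_Timestamp = as.POSIXct(c("2016-01-01 10:51:15",
                                    "2016-01-01 10:52:36",
                                    NA,
                                    "2016-01-02 09:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Keep only rows that have a case ID, an activity and a timestamp
clean <- toy[toy$Case_ID != "" & toy$Activity != "" &
               !is.na(toy$Complete_Timestamp), ]
print(nrow(clean))
```

The same condition could be applied to data before calling eventlog(); it was not applied in this analysis, which is why the empty "" group still appears in the results below.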

Activity log with activities_to_eventlog() (which accepts a vector of timestamp columns)

events <- bupaR::activities_to_eventlog(
  data_act,
  case_id = 'Case_ID',
  activity_id = 'Activity',
  resource_id = 'Resource',
  timestamps = c('starttimestamp', 'endtimestamp')
)
events %>% summary
## Number of events:  800000
## Number of cases:  57165
## Number of traces:  2917
## Number of distinct activities:  26
## Average trace length:  13.99458
## 
## Start eventlog:  NA
## End eventlog:  NA
##    Case_ID                              Activity         Resource     
##  Length:800000      O_Create Offer          : 56048   User_1 : 98752  
##  Class :character   O_Created               : 56048   User_49: 15556  
##  Mode  :character   O_Sent (mail and online): 51876   User_29: 13864  
##                     W_Validate application  : 50960   User_3 : 13610  
##                     A_Validating            : 50026   User_10: 13388  
##                     (Other)                 :461496   (Other):571284  
##                     NA's                    : 73546   NA's   : 73546  
##  Start_Timestamp    Complete_Timestamp   Variant          Variant_index   
##  Length:800000      Length:800000      Length:800000      Min.   :   1.0  
##  Class :character   Class :character   Class :character   1st Qu.:   7.0  
##  Mode  :character   Mode  :character   Mode  :character   Median :  25.0  
##                                                           Mean   : 390.3  
##                                                           3rd Qu.: 257.0  
##                                                           Max.   :3159.0  
##                                                           NA's   :73546   
##  (case)_ApplicationType (case)_creditGoal  (case)_RequestedAmount
##  Length:800000          Length:800000      Min.   :     0        
##  Class :character       Class :character   1st Qu.:  6000        
##  Mode  :character       Mode  :character   Median : 12000        
##                                            Mean   : 15618        
##                                            3rd Qu.: 20000        
##                                            Max.   :450000        
##                                            NA's   :73546         
##   Accepted          Action           CreditScore       EventID         
##  Mode :logical   Length:800000      Min.   :   0.0   Length:800000     
##  FALSE:16718     Class :character   1st Qu.:   0.0   Class :character  
##  TRUE :39330     Mode  :character   Median :   0.0   Mode  :character  
##  NA's :743952                       Mean   : 319.9                     
##                                     3rd Qu.: 851.0                     
##                                     Max.   :1142.0                     
##                                     NA's   :743952                     
##  EventOrigin        FirstWithdrawalAmount  MonthlyCost    
##  Length:800000      Min.   :    0         Min.   :  43.0  
##  Class :character   1st Qu.:    0         1st Qu.: 150.0  
##  Mode  :character   Median : 5000         Median : 232.8  
##                     Mean   : 7681         Mean   : 273.9  
##                     3rd Qu.:10996         3rd Qu.: 340.8  
##                     Max.   :75000         Max.   :6673.8  
##                     NA's   :743952        NA's   :743952  
##  NumberOfTerms      OfferID          OfferedAmount     Selected      
##  Min.   :  5      Length:800000      Min.   : 5000    Mode :logical  
##  1st Qu.: 56      Class :character   1st Qu.: 8000    FALSE:27830    
##  Median : 74      Mode  :character   Median :15000    TRUE :28218    
##  Mean   : 82                         Mean   :17820    NA's :743952   
##  3rd Qu.:120                         3rd Qu.:24000                   
##  Max.   :180                         Max.   :75000                   
##  NA's   :743952                      NA's   :743952                  
##  lifecycle:transition activity_instance_id         lifecycle_id   
##  Length:800000        Length:800000        endtimestamp  :400000  
##  Class :character     Class :character     starttimestamp:400000  
##  Mode  :character     Mode  :character                            
##                                                                   
##                                                                   
##                                                                   
##                                                                   
##    timestamp                       .order     
##  Min.   :2016-01-01 10:51:15   Min.   :1e+00  
##  1st Qu.:2016-03-19 16:11:02   1st Qu.:2e+05  
##  Median :2016-06-07 10:26:21   Median :4e+05  
##  Mean   :2016-05-28 15:39:43   Mean   :4e+05  
##  3rd Qu.:2016-08-02 20:05:06   3rd Qu.:6e+05  
##  Max.   :2017-01-26 10:11:10   Max.   :8e+05  
##  NA's   :73546
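activities_to_eventlog() reports 800000 events for the 400000 rows of data_act, and its lifecycle_id column holds exactly 400000 starttimestamp and 400000 endtimestamp values: every row (one activity instance) is split into one event per timestamp column. The base-R sketch below reproduces this reshaping on a hypothetical two-row table (invented values); it only illustrates the idea and is not bupaR's implementation.

```r
# Hypothetical activity table: one row per activity instance,
# with a start and an end timestamp (same layout idea as data_act)
acts <- data.frame(
  Case_ID  = c("Application_1", "Application_1"),
  Activity = c("A_Create Application", "W_Complete application"),
  starttimestamp = as.POSIXct(c("2016-01-01 10:51:15",
                                "2016-01-02 11:45:22"), tz = "UTC"),
  endtimestamp   = as.POSIXct(c("2016-01-01 10:51:15",
                                "2016-01-02 12:20:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
acts$activity_instance_id <- seq_len(nrow(acts))

# Stack the timestamp columns: each row yields one event per column,
# labelled with the name of the column it came from
to_events <- function(df, ts_cols) {
  parts <- lapply(ts_cols, function(col) {
    data.frame(df[, c("Case_ID", "Activity", "activity_instance_id")],
               lifecycle_id = col,
               timestamp = df[[col]],
               stringsAsFactors = FALSE)
  })
  ev <- do.call(rbind, parts)
  ev[order(ev$activity_instance_id, ev$timestamp), ]
}

ev <- to_events(acts, c("starttimestamp", "endtimestamp"))
print(ev)
```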

!! From here on we mainly work with the event log (the eventlog variable); some steps are also run on the activity log (the events variable) for comparison !!

Activity frequencies

freq=activity_frequency(events,level = "activity")
freq2=activity_frequency(eventlog,level = "activity")
#?activity_frequency
plot(freq)

plot(freq2)
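activity_frequency(level = "activity") returns, for each activity, an absolute count and a relative share (the absolute and relative columns printed further down). A minimal base-R sketch of that computation on a hypothetical toy log, assuming one row per activity instance:

```r
# Hypothetical toy log: one row per activity instance
toy_log <- data.frame(
  case     = c("c1", "c1", "c1", "c2", "c2", "c3"),
  activity = c("A_Create Application", "A_Submitted", "A_Validating",
               "A_Create Application", "A_Validating",
               "A_Create Application"),
  stringsAsFactors = FALSE
)

# Absolute frequency: number of instances of each activity
freq <- as.data.frame(table(activity = toy_log$activity),
                      responseName = "absolute", stringsAsFactors = FALSE)
# Relative frequency: share of all activity instances
freq$relative <- freq$absolute / sum(freq$absolute)
# Most frequent activity first
freq <- freq[order(-freq$absolute), ]
print(freq)
```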

Keep only the cases that contain a given activity (we chose ‘A_Validating’)

With eventlog

eventlog %>% 
  filter_activity_presence(activities = c('A_Validating')) %>% 
  activity_frequency(level = "activity") 
## # A tibble: 25 x 3
##    Activity                 absolute relative
##    <fct>                       <int>    <dbl>
##  1 W_Validate application      25480   0.0904
##  2 A_Validating                25013   0.0887
##  3 O_Create Offer              19781   0.0701
##  4 O_Created                   19781   0.0701
##  5 O_Sent (mail and online)    18188   0.0645
##  6 O_Returned                  15130   0.0537
##  7 W_Call incomplete files     14457   0.0513
##  8 A_Incomplete                14335   0.0508
##  9 W_Call after offers         14228   0.0505
## 10 A_Accepted                  14178   0.0503
## # ... with 15 more rows
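filter_activity_presence() works at the case level: it keeps every event of the cases that contain A_Validating, which is why the filtered log still lists 25 different activities rather than A_Validating alone. A base-R sketch of that selection on a hypothetical toy log:

```r
# Hypothetical toy log: c1 and c2 contain A_Validating, c3 does not
toy_log <- data.frame(
  case     = c("c1", "c1", "c1", "c2", "c2", "c3"),
  activity = c("A_Create Application", "A_Submitted", "A_Validating",
               "A_Create Application", "A_Validating",
               "A_Create Application"),
  stringsAsFactors = FALSE
)

# Cases in which the activity occurs at least once
keep_cases <- unique(toy_log$case[toy_log$activity == "A_Validating"])
# Keep all events of those cases, not only the A_Validating events
filtered <- toy_log[toy_log$case %in% keep_cases, ]
print(table(filtered$activity))
```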

With events

events %>% 
  filter_activity_presence(activities = c('A_Validating')) %>% 
  activity_frequency(level = "activity") 
## # A tibble: 25 x 3
##    Activity                 absolute relative
##    <fct>                       <int>    <dbl>
##  1 W_Validate application      25480   0.0904
##  2 A_Validating                25013   0.0887
##  3 O_Create Offer              19781   0.0701
##  4 O_Created                   19781   0.0701
##  5 O_Sent (mail and online)    18188   0.0645
##  6 O_Returned                  15130   0.0537
##  7 W_Call incomplete files     14457   0.0513
##  8 A_Incomplete                14335   0.0508
##  9 W_Call after offers         14228   0.0505
## 10 A_Accepted                  14178   0.0503
## # ... with 15 more rows
events %>%
  filter_activity_frequency(percentage = 1.0) %>% # 100% coverage: keeps all activities
  filter_trace_frequency(percentage = .80) %>%    # most frequent traces covering 80% of cases
  process_map(render = TRUE)
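In this pipeline, filter_activity_frequency(percentage = 1.0) keeps all activities, while filter_trace_frequency(percentage = .80) keeps, as we understand the edeaR documentation, the most frequent traces that together cover at least 80% of the cases; trace_explorer(coverage = 0.8) further down appears to use a similar notion of coverage. The base-R sketch below illustrates trace-frequency filtering on a hypothetical toy log of 10 cases (rows assumed already sorted by time within each case); it does not reproduce edeaR's handling of ties between equally frequent traces.

```r
# Hypothetical toy log built from three variants:
# 6 cases follow A-B, 3 follow A-C-B and 1 follows A only
variants <- list(c("A", "B"), c("A", "C", "B"), "A")
n_cases  <- c(6, 3, 1)
rows <- list()
id <- 0
for (v in seq_along(variants)) {
  for (k in seq_len(n_cases[v])) {
    id <- id + 1
    rows[[id]] <- data.frame(case = paste0("c", id),
                             activity = variants[[v]],
                             stringsAsFactors = FALSE)
  }
}
toy_log <- do.call(rbind, rows)

# 1. Trace of each case: its activity sequence as one string
traces <- sapply(split(toy_log$activity, toy_log$case), paste, collapse = ",")
# 2. Number of cases per trace, most frequent first
trace_counts <- sort(table(traces), decreasing = TRUE)
# 3. Smallest set of top traces whose cumulative share reaches 80%
coverage    <- cumsum(trace_counts) / sum(trace_counts)
n_keep      <- which(coverage >= 0.8)[1]
kept_traces <- names(trace_counts)[seq_len(n_keep)]
# 4. Keep all events of the cases that follow one of those traces
kept_cases <- names(traces)[traces %in% kept_traces]
filtered   <- toy_log[toy_log$case %in% kept_cases, ]
print(kept_traces)
```

Here the two most frequent traces cover 60% and then 90% of the cases, so A-B and A-C-B are kept and the single A-only case is dropped.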

The same pipeline can be run on the eventlog variable.

Precedence matrix plot

plot(precedence_matrix(events,type = "absolute"))

plot(precedence_matrix(eventlog,type="absolute"))
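precedence_matrix(type = "absolute") counts, for each pair of activities, how often the first is directly followed by the second within a case. The base-R sketch below builds such a directly-follows count matrix from a hypothetical toy log whose rows are assumed to be sorted by time within each case. bupaR's own matrix may contain additional entries (for example for the start and end of cases), so treat this only as an illustration of the idea.

```r
# Hypothetical toy log, rows already sorted by time within each case
toy_log <- data.frame(
  case     = c("c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3"),
  activity = c("A", "B", "C", "A", "C", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# For each case, pair every activity with the one that directly follows it
pairs <- do.call(rbind, lapply(split(toy_log$activity, toy_log$case),
                               function(a) {
  if (length(a) < 2) return(NULL)
  data.frame(antecedent = a[-length(a)], consequent = a[-1],
             stringsAsFactors = FALSE)
}))

# Absolute precedence matrix: rows = antecedent, columns = consequent
acts <- sort(unique(toy_log$activity))
prec <- table(antecedent = factor(pairs$antecedent, levels = acts),
              consequent = factor(pairs$consequent, levels = acts))
print(prec)
```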

Trace exploration

trace_explorer(events,coverage = 0.8)
## Warning: Removed 1 rows containing missing values (geom_text).

trace_explorer(eventlog,coverage = 0.8)

Dotted chart

dotted_chart(eventlog)
## Joining, by = "Case_ID"
## Warning: Removed 36773 rows containing missing values (geom_point).

Throughput time in hours grouped by application type

gb1=eventlog %>%
  group_by(`X.case._ApplicationType`) %>% 
  throughput_time('log', units = 'hours')
gb1
## # A tibble: 3 x 10
##   X.case._Applica~     min    q1 median  mean    q3   max st_dev   iqr
##   <fct>              <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>
## 1 New credit        0.0781  286.   492.  545.  762. 4034.   323.  476.
## 2 ""               NA        NA     NA   NaN    NA    NA  36773    NA 
## 3 Limit raise       0.0597  223.   336.  416.  549. 3038.   270.  326.
## # ... with 1 more variable: NA. <dbl>
plot(gb1)
## Warning: Removed 36773 rows containing non-finite values (stat_boxplot).

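We can see that “New credit” is by far the most common application type (326833 events against 36394 for “Limit raise” in the event log summary) and also the slower one: its mean throughput time is about 545 hours (median 492 h), versus 416 hours (median 336 h) for “Limit raise”.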
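throughput_time('log', units = 'hours') summarises the throughput time of the cases, i.e. the time between the first and the last event of each case; after group_by() the statistics are computed per group. A base-R sketch of the mean per application type, on a hypothetical toy log with three cases:

```r
# Hypothetical toy log: three cases, two events each
toy_log <- data.frame(
  case = c("c1", "c1", "c2", "c2", "c3", "c3"),
  type = c("New credit", "New credit", "New credit", "New credit",
           "Limit raise", "Limit raise"),
  timestamp = as.POSIXct(c("2016-01-01 10:00:00", "2016-01-03 10:00:00",
                           "2016-01-01 08:00:00", "2016-01-02 08:00:00",
                           "2016-01-05 09:00:00", "2016-01-05 21:00:00"),
                         tz = "UTC"),
  stringsAsFactors = FALSE
)

# Throughput time of each case: last event minus first event, in hours
secs      <- as.numeric(toy_log$timestamp)
tp_hours  <- sapply(split(secs, toy_log$case),
                    function(s) (max(s) - min(s)) / 3600)
# Application type of each case (constant within a case)
case_type <- sapply(split(toy_log$type, toy_log$case), function(x) x[1])

cases <- data.frame(case  = names(tp_hours),
                    type  = unname(case_type[names(tp_hours)]),
                    hours = unname(tp_hours),
                    stringsAsFactors = FALSE)

# Mean throughput time per application type
by_type <- aggregate(hours ~ type, data = cases, FUN = mean)
print(by_type)
```

Here the two New credit cases take 48 and 24 hours (mean 36) and the single Limit raise case takes 12 hours.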

Throughput time in hours grouped by loan objective

gb2=eventlog %>%
  group_by(`X.case._creditGoal`) %>% 
  throughput_time('log', units = 'hours')
gb2
## # A tibble: 14 x 10
##    X.case._creditG~      min    q1 median  mean    q3   max st_dev   iqr
##    <fct>               <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>
##  1 Existing credit~   0.326   318.   524.  576.  765. 3218.   322.  447.
##  2 Home improvement   0.174   305.   481.  543.  760. 2546.   300.  455.
##  3 Car                0.144   245.   411.  504.  758. 4034.   310.  513.
##  4 ""                NA        NA     NA   NaN    NA    NA  36773    NA 
##  5 Remaining debt ~   0.135   380.   732.  724.  884. 3501.   486.  504.
##  6 Not speficied      0.103   326.   619.  583.  782. 2547.   329.  456.
##  7 Unknown            0.0597  216.   358.  441.  732. 3191.   302.  516.
##  8 Caravan / Camper   0.174   226.   333.  445.  730. 2110.   304.  504.
##  9 Tax payments      51.4     296.   425.  545.  757. 2177.   365.  460.
## 10 Extra spending ~  17.8     266.   440.  506.  755. 1815.   280.  488.
## 11 Motorcycle        46.9     259.   391.  480.  759. 1338.   273.  500.
## 12 Boat              55.0     267.   428.  514.  747. 1536.   291.  480.
## 13 Business goal    134.      265.   591.  566.  758. 1245.   315.  494.
## 14 Debt restructur~ 732.      732.   732.  732.  732.  732.    NA     0 
## # ... with 1 more variable: NA. <dbl>
plot(gb2)
## Warning: Removed 36773 rows containing non-finite values (stat_boxplot).

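Among the most common loan goals (Car, Home improvement and Existing credit takeover), “Existing credit takeover” has the highest mean throughput time, about 576 hours. It is not the highest overall: the “Remaining debt …” goal averages about 724 hours, and debt restructuring shows 732 hours, although the latter comes from a single case (min, median and max are identical and st_dev is NA).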